System and method for wearable medical sensor and neural network based diabetes analysis

ABSTRACT

According to various embodiments, a machine-learning based system for diabetes analysis is disclosed. The system includes one or more processors configured to interact with a plurality of wearable medical sensors (WMSs). The processors are configured to receive physiological data from the WMSs and demographic data from a user interface. The processors are further configured to train at least one neural network based on a grow-and-prune paradigm to generate at least one diabetes inference model. The neural network grows at least one of connections and neurons based on gradient information and prunes away at least one of connections and neurons based on magnitude information. The processors are also configured to output a diabetes-based decision by inputting the received physiological data and demographic data into the generated diabetes inference model.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to provisional application 62/862,354, filed Jun. 17, 2019, which is herein incorporated by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant No. CNS-1617640 awarded by the National Science Foundation. The government has certain rights in the invention.

FIELD OF THE INVENTION

The present invention relates generally to wearable medical sensors and neural networks and, more particularly, to a system and method for diagnosis and monitoring of diabetes based on wearable medical sensor data and neural network processing that bypasses feature extraction.

BACKGROUND OF THE INVENTION

More than 422 million people around the world (more than 24 million in the U.S. alone) suffer from diabetes. This chronic disease imposes a substantial economic burden on both patients and the government, and accounts for nearly 25% of all healthcare expenditure in the U.S. However, diabetes prevention, care, and especially early diagnosis remain challenging, given that the disease usually develops and is treated outside a clinic, hence out of reach of advanced clinical care. It is estimated that more than 75% of patients remain undiagnosed. This may lead to irreversible and costly consequences. For example, studies have shown that the longer a person lives with undiagnosed and untreated diabetes, the worse their health outcomes are likely to be. Without an early alarm, people with pre-diabetes, a less severe condition that can be cured, could progress within five years to diabetes mellitus, which can no longer be cured. Thus, there is an urgent need for an accessible and accurate diabetes diagnosis system for daily-life scenarios, which can greatly improve general welfare and bend the associated healthcare expenditure downwards.

The emergence of wearable medical sensors (WMSs) points to a promising way to address this challenge. In the past decade, advancements in low-power sensors and signal processing techniques have led to many disruptive WMSs. These WMSs enable continuous sensing of physiological signals during daily activities, and thus provide a powerful, yet user-transparent, human-machine interface for tracking the user's health status. Combining WMSs and machine learning brings up the possibility of pervasive health condition tracking and disease diagnosis in a daily context. This approach exploits the superior knowledge distillation capability of machine learning to extract medical insights from health-related physiological signals. Hence, it offers a promising method to bridge the information gap that currently separates the clinical and daily domains. This helps enable a unified smart healthcare system that serves people in both the daily and clinical scenarios.

However, disease diagnosis based on WMS data and its effective deployment at the edge remain challenging. Conventional approaches typically involve feature extraction, model training, and model deployment. Such an approach suffers from two major problems:

Inefficient feature extraction: Handcrafting features may require substantial engineering effort and expert domain knowledge for each targeted disease. Searching for informative features through trial-and-error can be very inefficient; hence, it may not be easy to effectively explore the available feature space. This problem is exacerbated when the feature space scales up given (i) a growing number of available signal types from WMSs, and (ii) more than 69,000 human diseases that need to be monitored.

Vast computation cost: Due to the vast number of floating-point operations (FLOPs) involved in feature extraction and model inference, continuous health monitoring can be very computationally intensive, hence hard to deploy on resource-constrained platforms.

As such, there is a need for a disease analysis framework based on WMS data and machine learning that addresses at least the above deficiencies.

SUMMARY OF THE INVENTION

According to various embodiments, a machine-learning based system for diabetes analysis is disclosed. The system includes one or more processors configured to interact with a plurality of wearable medical sensors (WMSs). The processors are configured to receive physiological data from the WMSs and demographic data from a user interface. The processors are further configured to train at least one neural network based on a grow-and-prune paradigm to generate at least one diabetes inference model. The neural network grows at least one of connections and neurons based on gradient information and prunes away at least one of connections and neurons based on magnitude information. The processors are also configured to output a diabetes-based decision by inputting the received physiological data and demographic data into the generated diabetes inference model.

According to various embodiments, a machine-learning based method for diabetes analysis based on one or more processors configured to interact with a plurality of wearable medical sensors (WMSs) is disclosed. The method includes receiving physiological data from the WMSs and demographic data from a user interface. The method further includes training at least one neural network based on a grow-and-prune paradigm to generate at least one diabetes inference model. The neural network grows at least one of connections and neurons based on gradient information and prunes away at least one of connections and neurons based on magnitude information. The method also includes outputting a diabetes-based decision by inputting the received physiological data and demographic data into the generated diabetes inference model.

According to various embodiments, a non-transitory computer-readable medium having stored thereon a computer program for execution by a processor configured to perform a machine-learning based method for diabetes analysis is disclosed. The method includes receiving physiological data from one or more wearable medical sensors (WMSs) and demographic data from a user interface. The method further includes training at least one neural network based on a grow-and-prune paradigm to generate at least one diabetes inference model. The neural network grows at least one of connections and neurons based on gradient information and prunes away at least one of connections and neurons based on magnitude information. The method also includes outputting a diabetes-based decision by inputting the received physiological data and demographic data into the generated diabetes inference model.

Various other features and advantages will be made apparent from the following detailed description and the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

In order for the advantages of the invention to be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the invention and are not, therefore, to be considered to be limiting its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:

FIG. 1 depicts a block diagram of a system for implementing a DiabDeep framework according to an embodiment of the present invention;

FIG. 2 depicts a schematic diagram of a DiabDeep framework according to an embodiment of the present invention;

FIG. 3 depicts a schematic diagram of a DiabNN architecture according to an embodiment of the present invention;

FIG. 4 depicts a schematic diagram of an H-LSTM cell according to an embodiment of the present invention;

FIG. 5 depicts a gradient based growth methodology according to an embodiment of the present invention;

FIG. 6 depicts a magnitude based pruning methodology according to an embodiment of the present invention;

FIG. 7 depicts a photo of a smartwatch and smartphone used for data collection according to an embodiment of the present invention;

FIG. 8 depicts a table of data types collected from each participant according to an embodiment of the present invention;

FIG. 9 depicts a table of a DiabNN-server confusion matrix for binary classification according to an embodiment of the present invention;

FIG. 10 depicts a table of a DiabNN-server confusion matrix for three-class classification according to an embodiment of the present invention;

FIG. 11 depicts a table of a DiabNN-edge confusion matrix for binary classification according to an embodiment of the present invention;

FIG. 12 depicts a table of a DiabNN-edge confusion matrix for three-class classification according to an embodiment of the present invention;

FIG. 13 depicts a table of performance comparison between DiabNN-server and DiabNN-edge according to an embodiment of the present invention;

FIG. 14 depicts a table of inference model comparison between DiabNNs and conventional frameworks according to an embodiment of the present invention; and

FIG. 15 depicts a table of performance comparison between DiabDeep and prior work according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Diabetes impacts the quality of life of millions of people around the globe. However, diagnosing diabetes remains arduous yet urgent, given that this disease develops and is treated outside the clinic. The emergence of wearable medical sensors (WMSs) and machine learning points to a potential way forward to address this challenge. WMSs enable a continuous yet user-transparent mechanism to collect and analyze physiological signals. However, disease diagnosis based on WMS data and its effective deployment on resource-constrained edge devices remain challenging due to inefficient feature extraction and vast computation cost.

To address these problems, generally disclosed herein are embodiments for a framework (referred to herein as DiabDeep) that combines efficient neural networks (called DiabNNs) with off-the-shelf WMSs for pervasive diabetes diagnosis. DiabDeep bypasses the feature extraction stage and acts directly on WMS data. It enables both (i) accurate inference on the server, e.g., a desktop, and (ii) efficient inference on an edge device, e.g., a smartphone, to obtain a balance between accuracy and efficiency based on varying resource budgets and design goals. On the resource-rich server, sparsely connected layers are stacked to deliver high accuracy. On the resource-scarce edge device, a hidden-layer LSTM (H-LSTM) based recurrent layer is used to substantially cut down on computation and storage costs while incurring only a minor accuracy loss. At the core of the framework lies a grow-and-prune training flow: it leverages gradient-based growth and magnitude-based pruning methodologies to enable DiabNNs to learn both weights and connections, while improving accuracy and efficiency.

The effectiveness of DiabDeep is demonstrated through a detailed analysis of data collected from 39 participants. For server (edge) side inference, a 96.6% (95.5%) accuracy is achieved in classifying diabetics against healthy individuals, and a 96.2% (95.3%) accuracy is achieved in distinguishing among type-1 diabetic, type-2 diabetic, and healthy individuals. Against conventional baselines, such as support vector machines with linear and radial basis function kernels, k-nearest neighbor, random forest, and linear ridge classifiers, DiabNNs achieve higher accuracy, while reducing the model size (floating-point operations) by up to 387.1× (8.1×). DiabDeep also cuts down on computation costs relative to conventional approaches. Therefore, the framework can be viewed as pervasive and efficient, yet very accurate.

System Overview

FIG. 1 depicts a system 10 configured to implement machine learning based diabetes analysis from WMS data. The system 10 includes one or more wearable medical sensors (WMSs) 12. The WMSs 12 may be connected to a diabetes analysis device 14 via a network system 16. The WMSs 12 may also be integrated into the device 14, in which case a network system 16 is not required. The device 14 may be implemented in a variety of configurations including general computing devices such as but not limited to desktop computers, laptop computers, tablets, network appliances, and the like. The device 14 may also be implemented as a mobile device such as but not limited to a mobile phone, smart phone, smart watch, or tablet computer. Where the WMSs 12 are integrated into the device 14, the device 14 may be implemented as one or more IoT sensors.

The device 14 includes one or more processors 14a such as but not limited to a central processing unit (CPU), a graphics processing unit (GPU), or a field programmable gate array (FPGA) for performing specific functions and memory 14b for storing those functions. The processor 14a includes a machine learning (ML) module 18 for monitoring and diagnosing diabetes. The ML module 18 methodology will be described in greater detail below. It is also to be noted that the training process for the ML module 18 may be implemented in a number of configurations with a variety of processors (including but not limited to CPUs, GPUs, and FPGAs), such as servers, desktop computers, laptop computers, tablets, and the like.

The network system 16 may be implemented as a single network or a combination of multiple networks. Network system 16 may include but is not limited to wireless telecommunications networks, WiFi, Bluetooth, Zigbee, or other communications networks. Network system 16 may be a wired network as well.

Machine Learning for Diabetes Diagnosis

A number of studies have focused on applying machine learning to diabetes diagnosis, from the clinical domain to the daily scenario.

Clinical approach: Electronic health records have been used as an information source for diabetes prediction and intervention. With the recent upsurge in the availability of biomedical datasets, new information sources have been unveiled for diabetes diagnosis, including gene sequences and retinal images. However, these approaches are still restricted to the clinical domain, and hence have very limited access to patient status once the patient leaves the clinic.

Daily approach: Daily glucose level detection has recently captured an increasing amount of research attention. One stream of study has explored subcutaneous glucose monitoring for continuous glucose tracking in a daily scenario. However, this is an invasive approach that still requires a high level of compliance, relies on regular sensor replacement (every 3-14 days), and impacts user experience. Recent systems have started exploiting non-invasive WMSs to alleviate these shortcomings. For example, one system combined machine learning ensembles and non-invasive WMSs to achieve a diabetes diagnostic accuracy of 77.6%. Another system, called DeepHeart, acts on Apple Watch data and patient demographics and uses bidirectional LSTMs to deliver an 84.5% diagnostic accuracy. However, it relies on a narrow spectrum of WMS signals that includes only discrete heart rate and step count measurements (indirectly estimated by accelerometers and photoplethysmography). This may lead to information loss, and hence reduced diagnostic capability. A third system achieved a 93.6% diagnostic accuracy by combining convolutional neural networks (CNNs) with LSTMs and heart rate variability measurements. However, this system has to rely on an electrocardiogram (ECG) data stream sampled at 500 Hz that is not yet well supported by commercial WMSs.

Efficient Neural Networks

Two lines of research on efficient NNs are reviewed next.

Compact model architecture: One stream of research exploits the design of efficient building blocks for NN redundancy removal. For example, MobileNetV2 stacks inverted residual building blocks to effectively shrink its model size and reduce its FLOPs. Another uses a channel shuffle operation and depth-wise convolution to deliver model compactness. A third proposes ShiftNet, built from shift-based modules as opposed to spatial convolution layers, to achieve substantial computation and storage cost reduction. Similarly, automated compact architecture design also provides a promising solution. One such approach, ChamNet, develops efficient performance predictors to speed up the search process for efficient NNs. Compared to MobileNetV2 on the ImageNet dataset, the generated ChamNets achieve up to 8.5% absolute top-1 accuracy improvement while reducing inference latency substantially.

Network compression: Compression techniques have emerged as another popular direction for NN redundancy removal. The pruning methodology was initially demonstrated to be effective on large CNNs by reducing the number of parameters in AlexNet by 9× and VGG by 13× for the well-known ImageNet dataset, without any accuracy loss. Follow-up works have also successfully shown its effectiveness on recurrent NNs such as the LSTM. Network growth is a complementary method to pruning that enables a sparser yet more accurate model before pruning starts. A grow-and-prune synthesis paradigm typically reduces the number of parameters in CNNs and LSTMs by another 2×, and increases the classification accuracy. It enables NN based inference even on Internet-of-Things (IoT) sensors. The model can be further compressed through low-bit quantization. For example, one configuration showed that a ternary representation of the weights instead of full-precision (32-bit) in ResNet-56 can significantly reduce memory cost while incurring only a minor accuracy loss. The quantized models offer additional speedup potential for current NN accelerators.

The DiabDeep Framework

The disclosed DiabDeep framework 20 is illustrated in FIG. 2. DiabDeep 20 captures both physiological information 22 and demographic information 24 as data input 26. DiabDeep 20 deploys a grow-and-prune training paradigm 28 for model training 30 to deliver two inference models, i.e., DiabNN-server 32 and DiabNN-edge 34, that enable inference on the server 36 and on the edge 38, respectively, in its deployment 40. Finally, DiabDeep 20 generates diagnosis 42 as output 44. The details of data input 26, model training 30, and model inference 40 are as follows.

Data input 26: As mentioned earlier, DiabDeep 20 focuses on (i) physiological signals 22 and (ii) demographic information 24 that are available in a daily context. Physiological signals 22 can be captured by WMSs 46 (e.g., smartphone and smartwatch) in a non-invasive, passive, and efficient manner. The list of collectible signals includes, but is not limited to, heart rate, body temperature, Galvanic skin response, and blood volume pulse. Additional signals such as electromechanical and ambient environmental data (e.g., accelerometer, gyroscope, and humidity sensor readings) may also support tracking of user habits that offers diagnostic insights. This list is expanding rapidly, given the speed of ongoing technological advancements in this field. Demographic information 24 (e.g., age, weight, gender, and height) also assists with disease diagnosis. It can be easily captured and updated through a simple user interface on a smartwatch or smartphone. Then, both physiological data 22 and demographic data 24 are aggregated and merged into a comprehensive data input 26 for subsequent analysis.

Model training 30: DiabDeep 20 utilizes a grow-and-prune paradigm 28 to train its NNs 48 using the input data 26. It starts NN synthesis from a sparse seed architecture 50. It first allows the network 48 to grow connections and neurons based on gradient information 52. Then, it prunes away insignificant connections and neurons based on magnitude information 54 to drastically reduce model redundancy. This leads to a final architecture 56 with improved accuracy and efficiency, where the former is highly preferred on the server 36 and the latter is critical at the edge 38. The training process 30 generates two inference models, i.e., DiabNN-server 32 and DiabNN-edge 34, for server 36 and edge 38 inference, respectively. Both models share the same DiabNN architecture, but vary in the choice of internal NN layers based on different resource constraints and design objectives, as explained later.

Model inference 40: Due to the distinct inference environments encountered upon deployment, DiabNN-server 32 and DiabNN-edge 34 require different input data flows, as depicted by the separate data paths in FIG. 2. In DiabNN-server 32, data have to be accumulated in local memory 58, e.g., local phone/watch storage, before they can be transferred to the base station 36 on a daily, weekly, or monthly basis, depending on user preference. As opposed to this accumulation-and-inference process, DiabNN-edge 34 enables on-the-fly inference directly at the edge 38, e.g., a smartphone. This enables users to receive instant diagnostic decisions 42 at output 44. As mentioned earlier, this incurs a slight accuracy degradation (around 1%) due to the scarce energy and memory budgets on the edge 38. However, this deficit may be alleviated when DiabNN-edge 34 works jointly with DiabNN-server 32. When an alarm is raised, DiabNN-edge 34 can store the relevant data sections as disease-onset records (DORs) 60 that can later be transferred to DiabNN-server 32 for further analysis. In this manner, DiabNN-edge 34 substantially reduces the required edge memory by bypassing the storage of 'not-of-interest' signal sections, while preserving the capability to make accurate inferences on the server side. Such DORs 60 can also be used as informative references if future physician intervention and checkups are needed.

Diagnosis Output 44: DiabDeep 20 generates diagnosis 42 as output 44when DiabNN is fed the input data 26 from an individual.

The DiabNN Architecture

FIG. 3 shows the DiabNN architecture 62 that distills diagnostic decisions from data inputs 64. Three sequential steps are employed during this process: (i) data preprocessing 66, (ii) transformation via NN layers 68, and (iii) output generation 70 (here using Softmax).

The preprocessing stage is critical for DiabNN 62 inference due to the following functionality.

Data normalization: NNs typically favor normalized inputs. Normalization methods, such as min-max scaling, standardization, and L2 normalization, generally improve accuracy and noise tolerance. In this embodiment, min-max scaling is applied to scale each input data stream 64 into the [0,1] range:

$$x_{\text{normalized}} = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (1)$$
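Eq. (1) can be sketched in Python as follows. This is a minimal, illustrative sketch: the function names are hypothetical, the (min, max) range is fitted on the training set as described for the experiments below, and both the handling of a constant stream and the optional clipping of out-of-range validation/test values are assumptions not stated in the text.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train_stream: Sequence[float]) -> Tuple[float, float]:
    """Return the (min, max) value range of one training-set data stream."""
    return min(train_stream), max(train_stream)


def min_max_scale(stream: Sequence[float], lo: float, hi: float,
                  clip: bool = False) -> List[float]:
    """Apply Eq. (1) using a range (lo, hi) fitted on the training set.

    clip=True (an assumption) bounds values outside the training range to [0, 1].
    """
    span = hi - lo
    if span == 0:
        # Eq. (1) is undefined for a constant stream; map it to 0.0 (assumption).
        return [0.0] * len(stream)
    scaled = [(v - lo) / span for v in stream]
    if clip:
        scaled = [min(1.0, max(0.0, v)) for v in scaled]
    return scaled


# Usage: fit on a toy heart-rate training stream, then scale new values.
lo, hi = fit_min_max([60.0, 80.0, 100.0])
print(min_max_scale([60.0, 80.0, 100.0], lo, hi))  # [0.0, 0.5, 1.0]
print(min_max_scale([120.0], lo, hi, clip=True))   # [1.0]
```

Fitting the range on the training set only keeps validation and test statistics out of the preprocessing, which is why unseen values can land outside [0,1] unless clipped.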

Data alignment: WMS data streams 64 may vary in their start times and sampling frequencies. Therefore, data stream synchronization is ensured by checking their timestamps and applying the appropriate offsets.
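A minimal sketch of timestamp-based alignment is shown below, assuming each stream is described by a start timestamp, a constant sampling frequency, and its samples. The `Stream` class and `align_streams` function are hypothetical, and taking the common span as the overlap of all streams is an assumption; the text does not specify how the span is chosen.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stream:
    """Hypothetical container for one WMS data stream."""
    start: float          # timestamp of the first sample, in seconds
    fs: float             # sampling frequency, in Hz
    samples: List[float]

    def end(self) -> float:
        return self.start + len(self.samples) / self.fs


def align_streams(streams: Dict[str, Stream]) -> Dict[str, Stream]:
    """Keep, for every stream, the samples with timestamps in [t0, t1)."""
    t0 = max(s.start for s in streams.values())   # latest start time
    t1 = min(s.end() for s in streams.values())   # earliest end time
    if t1 <= t0:
        raise ValueError("streams do not overlap in time")
    aligned = {}
    for name, s in streams.items():
        # Per-stream offsets in samples; round() guards against float noise.
        first = math.ceil(round((t0 - s.start) * s.fs, 9))
        last = math.ceil(round((t1 - s.start) * s.fs, 9))
        aligned[name] = Stream(s.start + first / s.fs, s.fs, s.samples[first:last])
    return aligned


# Usage: a 4 Hz watch stream and a 1 Hz phone stream that starts 2 s later.
watch = Stream(0.0, 4.0, [float(i) for i in range(40)])
phone = Stream(2.0, 1.0, [float(i) for i in range(10)])
out = align_streams({"watch": watch, "phone": phone})
print(len(out["watch"].samples), len(out["phone"].samples))  # 32 8
```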

Different NN layers 68 are used in DiabNN 62 for server 32 and edge 34 inference. DiabNN-server 32 deploys sparsely connected (SC) layers 72 to target high accuracy, whereas DiabNN-edge 34 utilizes sparsely recurrent (SR) layers 74 to target extreme efficiency. All NN layers 68 are subjected to dropout regularization, a widely used approach for addressing overfitting and improving accuracy.

In DiabNN-server 32, each SC layer 72 conducts a linear transformation (using a sparse matrix as opposed to a full matrix) followed by a nonlinear activation function. As shown later, utilizing SC layers 72 leads to more model parameters than SR layers 74, and hence to improved learning capability and higher accuracy. Consequently, DiabNN-server 32 achieves a 1.1% accuracy improvement over DiabNN-edge 34 in binary classification.
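An SC layer can be sketched as a masked matrix-vector product followed by ReLU (the activation used in the experiments below). This pure-Python sketch assumes the binary-mask representation of sparse matrices described in the grow-and-prune section; `sc_layer` is a hypothetical name, and a practical implementation would use a tensor library and apply dropout only during training.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sc_layer(x: Sequence[float], w: Matrix, msk: Matrix,
             b: Sequence[float]) -> List[float]:
    """Sparsely connected layer: y = ReLU((W * Msk) x + b).

    W * Msk is the element-wise product of the weight matrix and its binary
    mask, so dormant connections (mask value 0) contribute nothing.
    """
    y = []
    for w_row, m_row, bias in zip(w, msk, b):
        z = bias + sum(wij * mij * xj for wij, mij, xj in zip(w_row, m_row, x))
        y.append(max(0.0, z))  # ReLU
    return y


# Usage: a 2-input, 2-output layer with one dormant connection.
x = [1.0, 2.0]
w = [[1.0, 1.0], [-1.0, 3.0]]
msk = [[1, 0], [1, 1]]
print(sc_layer(x, w, msk, [0.0, 0.0]))    # [1.0, 5.0]
print(sc_layer(x, w, msk, [0.0, -10.0]))  # [1.0, 0.0]
```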

In DiabNN-edge 34, the SR layer 74 design is based on an H-LSTM cell 76. It is a variant of an LSTM cell obtained by adding hidden layers to its control gates 78. FIG. 4 shows a schematic diagram of an H-LSTM 76. Its internal computation flow is governed by the following equations:

$$
\begin{aligned}
f_t &= \sigma\left(W_f^{s}\, H^{*}\left([x_t, h_{t-1}]\right) + b_f\right) && (2)\\
i_t &= \sigma\left(W_i^{s}\, H^{*}\left([x_t, h_{t-1}]\right) + b_i\right) && (3)\\
o_t &= \sigma\left(W_o^{s}\, H^{*}\left([x_t, h_{t-1}]\right) + b_o\right) && (4)\\
g_t &= \tanh\left(W_g^{s}\, H^{*}\left([x_t, h_{t-1}]\right) + b_g\right) && (5)\\
c_t &= f_t \otimes c_{t-1} + i_t \otimes g_t && (6)\\
h_t &= o_t \otimes \tanh(c_t) && (7)
\end{aligned}
$$

Here, f_t, i_t, o_t, g_t, x_t, h_t, and c_t denote the forget gate, input gate, output gate, cell update vector, input, hidden state, and cell state at step t, respectively. Further, h_{t−1} and c_{t−1} refer to the previous hidden and cell states at step t−1. H, W^s, b, σ, and ⊗ refer to a hidden layer that performs a linear transformation followed by an activation function, a sparse weight matrix, a bias, the sigmoid function, and element-wise multiplication, respectively. The superscript * indicates zero or more H layers for each NN gate 78. The additional hidden layers provide three advantages. First, they enhance gate control through a multi-level hierarchy that can lead to accuracy gains. Second, they can be easily regularized through dropout, and thus lead to better generalization. Third, they offer a wide range of choices for internal activation functions, such as the rectified linear unit (ReLU) as a nonlimiting example, that can lead to faster learning. Using H-LSTM based SR layers 74, DiabNN-edge 34 reduces the model size by 135.5× and inference FLOPs by 2.3× relative to DiabNN-server 32.
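A single H-LSTM time step, following Eqs. (2)-(7), can be sketched in pure Python as below. This is an illustrative sketch under stated assumptions: each gate has its own (possibly empty) stack of H layers, matching the * notation; ReLU is used inside the H layers; sparsity is represented with binary masks; and dropout is omitted because it applies only during training. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]
Matrix = List[List[float]]
# Hypothetical per-gate parameters: (H* layers, W^s, Msk, b); each H layer is
# a (weight matrix, bias vector) pair.
GateParams = Tuple[List[Tuple[Matrix, Vec]], Matrix, Matrix, Vec]


def matvec(w: Matrix, x: Sequence[float]) -> Vec:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def h_star(v: Sequence[float], layers: List[Tuple[Matrix, Vec]]) -> Vec:
    """Zero or more H layers: linear transformation followed by ReLU."""
    out = list(v)
    for w, b in layers:
        out = [max(0.0, z + bi) for z, bi in zip(matvec(w, out), b)]
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gate(xh: Vec, params: GateParams, act) -> Vec:
    layers, w_s, msk, b = params
    masked = [[w * m for w, m in zip(wr, mr)] for wr, mr in zip(w_s, msk)]
    z = matvec(masked, h_star(xh, layers))
    return [act(zi + bi) for zi, bi in zip(z, b)]


def h_lstm_step(x_t: Vec, h_prev: Vec, c_prev: Vec,
                p: Dict[str, GateParams]) -> Tuple[Vec, Vec]:
    """One H-LSTM time step; returns (h_t, c_t)."""
    xh = list(x_t) + list(h_prev)            # [x_t, h_{t-1}]
    f = gate(xh, p["f"], sigmoid)            # Eq. (2)
    i = gate(xh, p["i"], sigmoid)            # Eq. (3)
    o = gate(xh, p["o"], sigmoid)            # Eq. (4)
    g = gate(xh, p["g"], math.tanh)          # Eq. (5)
    c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]  # Eq. (6)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                   # Eq. (7)
    return h, c


# Usage: all-zero gate weights give f = i = o = 0.5 and g = 0, so c_t = 0.5 * c_{t-1}.
zero_gate = ([], [[0.0, 0.0]], [[1, 1]], [0.0])
h, c = h_lstm_step([0.3], [0.0], [2.0], {k: zero_gate for k in "fiog"})
print(c, round(h[0], 4))  # [1.0] 0.3808
```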

Grow and Prune Training 28 for DiabNN 62

The gradient-based network growth and magnitude-based network pruning algorithms are explained next in detail. Unless otherwise stated, a mask-based approach is assumed for handling sparse networks. Each weight matrix W has a corresponding binary mask matrix Msk of exactly the same size. The mask is used to disregard dormant connections (connections with zero-valued weights).

The methodology in FIG. 5 illustrates the connection growth process. The main objective of the weight growth phase is to locate only the most effective dormant connections to reduce the value of the loss function L. To do so, the gradients of all dormant connections are first evaluated, and this information is used as a metric for ranking their effectiveness. During the training process, the gradient of all weight matrices (W.grad) is extracted for each mini-batch of training data using the back-propagation algorithm. This process is repeated over a whole training epoch to accumulate W.grad. Then, the average gradient is calculated over the entire epoch by dividing the accumulated values by the number of training instances. A dormant connection w is activated if and only if its gradient magnitude is larger than the ((1−α)×100)th percentile of the gradient magnitudes of its associated layer matrix. Its initial value is set to the product of its gradient value and the current learning rate. The growth ratio α is a hyperparameter; 0.1≤α≤0.3 is typically used here, though that is not intended to be limiting. The NN growth method has been shown to be very effective in enabling the network to reach a higher accuracy with far less redundancy than a fully connected model.
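The growth step for one layer can be sketched as follows. This is a minimal sketch under several assumptions: the per-mini-batch gradients are taken with respect to every entry of the layer's weight matrix, including dormant ones (e.g., gradients of the loss with respect to the effective weights W ⊙ Msk, since masked entries of W itself would otherwise receive zero gradient); the percentile is computed over all entries of the averaged gradient matrix with linear interpolation; and `grow_connections` and `percentile` are hypothetical names.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def grow_connections(w: Matrix, msk: Matrix, batch_grads: List[Matrix],
                     n_instances: int, alpha: float, lr: float) -> int:
    """Gradient-based growth for one layer; returns the number of grown connections."""
    rows, cols = len(w), len(w[0])
    # Accumulate W.grad over all mini-batches of one epoch, then average.
    avg = [[sum(g[r][c] for g in batch_grads) / n_instances for c in range(cols)]
           for r in range(rows)]
    threshold = percentile([abs(v) for row in avg for v in row], (1 - alpha) * 100)
    grown = 0
    for r in range(rows):
        for c in range(cols):
            if msk[r][c] == 0 and abs(avg[r][c]) > threshold:
                msk[r][c] = 1                 # activate the dormant connection
                w[r][c] = avg[r][c] * lr      # gradient x current learning rate
                grown += 1
    return grown


# Usage: with alpha = 0.25, only the dormant entry with the largest gradient grows.
w = [[0.5, 0.0], [0.0, 0.0]]
msk = [[1, 0], [0, 0]]
print(grow_connections(w, msk, [[[0.1, 4.0], [0.2, 0.3]]], 1, 0.25, 0.01))  # 1
print(msk)  # [[1, 1], [0, 0]]
```

The initialization follows the wording above literally (gradient × learning rate); a single gradient-descent step from zero would instead give −learning rate × gradient.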

The connection pruning methodology is shown in FIG. 6. During this process, a connection w is removed if and only if its magnitude is smaller than the (β×100)th percentile of the weight magnitudes of its associated layer matrix. When a connection is pruned away, its weight value and its corresponding binary mask value are simultaneously set to zero. The pruning ratio β is also a hyperparameter; β≤0.3 is typically used here, though that is not intended to be limiting. Connection pruning is an iterative process, where the network is retrained to recover its accuracy after each pruning iteration.
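One magnitude-based pruning iteration for a single layer can be sketched as below. As assumptions, the percentile is computed over the currently active (unmasked) weights, since the zeros of dormant connections would otherwise pull the threshold toward zero, and it uses linear interpolation; the retraining that follows each iteration is omitted, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def prune_connections(w: Matrix, msk: Matrix, beta: float) -> int:
    """Magnitude-based pruning for one layer; returns the number of pruned connections."""
    active = [abs(w[r][c]) for r in range(len(w)) for c in range(len(w[r])) if msk[r][c]]
    if not active:
        return 0
    threshold = percentile(active, beta * 100)
    pruned = 0
    for r in range(len(w)):
        for c in range(len(w[r])):
            if msk[r][c] and abs(w[r][c]) < threshold:
                w[r][c] = 0.0   # the weight value and ...
                msk[r][c] = 0   # ... its mask value are zeroed together
                pruned += 1
    return pruned


# Usage: beta = 0.3 removes the smallest-magnitude connection here.
w = [[0.05, -0.8], [0.3, -0.01]]
msk = [[1, 1], [1, 1]]
print(prune_connections(w, msk, 0.3), msk)  # 1 [[1, 1], [1, 0]]
```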

Evaluating Performance of an Embodiment of DiabDeep

The DiabDeep framework is implemented here using PyTorch on an Nvidia GeForce GTX 1060 GPU (with 1.708 GHz frequency and 6 GB memory) and a Tesla P100 GPU (with 1.329 GHz frequency and 16 GB memory). CUDA 8.0 and cuDNN 5.1 libraries are employed in the following experiments. First, the dataset collected from 39 participants that is used for DiabDeep evaluation is described. Then, the performance of DiabNN-server and DiabNN-edge is analyzed for two classification tasks: (i) binary classification that distinguishes between diabetic and healthy individuals, and (ii) three-class classification that distinguishes among type-1 diabetic, type-2 diabetic, and healthy individuals.

Data Collection and Preparation:

Here, physiological data and demographic information were collected from 39 participants. Fourteen participants were diagnosed with diabetes (11 with type-1 and three with type-2 diabetes), whereas the remaining 25 participants were healthy non-diabetic baselines. The physiological data were collected using a commercially available Empatica E4 smartwatch and a Samsung Galaxy S4 smartphone, as shown in FIG. 7. A questionnaire was also used to gather demographic information from all the participants. All the data types collected are summarized in the table in FIG. 8.

During data collection, all the participants are first informed about the experiment, asked to sign a consent form, and asked to fill out the demographic questionnaire. Then, the Empatica E4 smartwatch is placed on the wrist of the participant's non-dominant hand, and the Samsung Galaxy S4 smartphone in the participant's pocket. The experiment lasts for approximately 1.5 hours, during which the smartwatch and smartphone continuously track and store the physiological signals. The Empatica E4 Connect portal is used for smartwatch data retrieval. An Android application is developed to record all the smartphone sensor data streams. All the data streams contain detailed timestamps that are later used for data synchronization. The experimental procedure was approved by the Institutional Review Board of Princeton University. None of the participants reported mental, cardiac, or endocrine disorders.

The dataset is preprocessed before training the model. The WMS data streams are first synchronized and windowed. To avoid time correlation between adjacent data windows, the data are divided into t=15 s windows with s=30 s shifts in between. The final dataset contains 4130 data instances, of which 70%, 10%, and 20% are used as the training, validation, and test sets, respectively. The training, validation, and test sets have no time overlap. The value range of each data stream is then extracted from the training set, and all three sets are scaled with the min-max scaling method explained earlier.
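Windowing and the time-ordered split can be sketched as below. Two interpretations are assumptions rather than statements from the text: the 30 s shift is taken as the offset between consecutive window starts (so a 15 s gap separates windows), and the split is applied to one time-ordered sequence of windows (whether it is done per participant is not specified). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(samples: Sequence[float], fs: float,
                 window_s: float = 15.0, shift_s: float = 30.0) -> List[List[float]]:
    """Cut one synchronized stream into fixed-length windows.

    Assumption: shift_s is the offset between consecutive window starts.
    """
    win = int(round(window_s * fs))
    step = int(round(shift_s * fs))
    return [list(samples[i:i + win]) for i in range(0, len(samples) - win + 1, step)]


def chronological_split(items: Sequence,
                        fractions: Tuple[float, float, float] = (0.7, 0.1, 0.2)
                        ) -> Tuple[list, list, list]:
    """Split time-ordered windows into train/validation/test sets with no time overlap."""
    items = list(items)
    n_train = int(round(len(items) * fractions[0]))
    n_val = int(round(len(items) * fractions[1]))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


# Usage: 120 s of a 1 Hz stream yields windows starting at 0, 30, 60, and 90 s.
windows = make_windows([float(i) for i in range(120)], fs=1.0)
print([w[0] for w in windows])  # [0.0, 30.0, 60.0, 90.0]
train, val, test = chronological_split(windows)
```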

DiabNN-server performance evaluation:

The detailed experimental results for DiabNN-server are first presented.

Data input: For each data instance, the data from both the smartphone and smartwatch within the same monitoring window are flattened and concatenated. This results in a vector of length 3705, where the flattened smartwatch window contains 2535 signal readings (from one data stream at 64 Hz, three data streams at 32 Hz, two data streams at 4 Hz, and one data stream at 1 Hz), and the flattened smartphone window provides an additional 1170 signal readings (from 26 data streams at 3 Hz). Finally, the seven demographic features are appended at its end, yielding a vector of length 3712 as the input for DiabNN-server.
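The lengths above follow from the 15 s window used during data preparation (readings = number of streams × sampling rate × 15 s). The short sketch below checks this arithmetic and shows one possible flattening order; the stream-by-stream ordering inside the vector and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

WINDOW_S = 15  # window length t from the data-preparation step

# (number of streams, sampling rate in Hz), as listed above
WATCH_STREAMS = [(1, 64), (3, 32), (2, 4), (1, 1)]
PHONE_STREAMS = [(26, 3)]
N_DEMOGRAPHIC = 7


def readings_per_window(streams: Sequence[Tuple[int, int]], window_s: int = WINDOW_S) -> int:
    return sum(n * hz * window_s for n, hz in streams)


def build_input(watch_window: Sequence[Sequence[float]],
                phone_window: Sequence[Sequence[float]],
                demographics: Sequence[float]) -> List[float]:
    """Flatten and concatenate per-stream windows, then append demographics."""
    vec = [v for stream in watch_window for v in stream]
    vec += [v for stream in phone_window for v in stream]
    vec += list(demographics)
    return vec


watch = readings_per_window(WATCH_STREAMS)   # 960 + 1440 + 120 + 15 = 2535
phone = readings_per_window(PHONE_STREAMS)   # 26 * 3 * 15 = 1170
print(watch, phone, watch + phone + N_DEMOGRAPHIC)  # 2535 1170 3712
```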

Model architecture: DiabNN-server uses five sequential SC layers with widths of 1024, 512, 256, 128, and 64, respectively. The input dimension equals the input tensor dimension of 3712. ReLU is used as the nonlinear activation function for all SC layers.
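A minimal NumPy sketch of the DiabNN-server forward pass is shown below. It makes two assumptions that the text does not state. First, an SC layer is modeled as a fully connected layer whose weight matrix is gated elementwise by a binary connectivity mask. Second, a final linear output layer maps the 64-wide representation to class logits. The initialization scheme is not specified in the text either. The 20% mask density mirrors the seed filling rate given for grow-and-prune training.

```python
import numpy as np

# Input dimension followed by the five SC layer widths given in the text.
WIDTHS = [3712, 1024, 512, 256, 128, 64]

def init_sc_stack(widths, n_classes, fill_rate=0.2, seed=0):
    """Create masked dense layers; an assumed output layer maps to n_classes logits."""
    rng = np.random.default_rng(seed)
    dims = widths + [n_classes]
    layers = []
    for fan_in, fan_out in zip(dims[:-1], dims[1:]):
        layers.append({
            "w": rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)),
            "b": np.zeros(fan_out),
            # Binary connectivity mask: roughly fill_rate of connections start active.
            "mask": rng.random((fan_in, fan_out)) < fill_rate,
        })
    return layers

def forward(layers, x):
    """Masked forward pass: ReLU after every SC layer, raw logits at the output."""
    h = x
    for k, layer in enumerate(layers):
        h = h @ (layer["w"] * layer["mask"]) + layer["b"]
        if k < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h

def dense_weight_count(dims):
    """Number of weights if every connection between consecutive layers existed."""
    return sum(a * b for a, b in zip(dims[:-1], dims[1:]))
```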

Training: A stochastic gradient descent (SGD) optimizer with a momentum of 0.9 is used for this experiment. The learning rate is initialized to 0.01 and is divided by 10 when the validation accuracy does not increase for 50 consecutive epochs. A batch size of 256 and a dropout ratio of 0.2 are used. For grow-and-prune training, the seed architecture is initialized with a filling rate of 20%. The network is grown for three epochs using a growth ratio of 0.2. For network pruning, the pruning ratio is initialized to 0.2. The pruning ratio is halved if the retrained model cannot restore accuracy on the validation set. The process terminates when the ratio falls below 0.01.
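The growth, pruning, and pruning-ratio schedule can be sketched as below. This is an illustration under stated assumptions, not the disclosed implementation:

- Growth activates dormant connections whose gradient magnitude exceeds the (1 − growth ratio) percentile of dormant gradient magnitudes.
- Pruning deactivates active connections whose weight magnitude falls below the pruning-ratio percentile.
- `try_prune_and_retrain` is a hypothetical callback that prunes, retrains, and reports whether validation accuracy was restored.

```python
import numpy as np

def grow(mask, grad, growth_ratio):
    """Activate dormant connections with large gradient magnitudes.

    Assumption: the threshold is the (1 - growth_ratio) percentile of the
    gradient magnitudes of currently dormant connections.
    """
    dormant = ~mask
    if not dormant.any():
        return mask.copy()
    g = np.abs(grad)
    threshold = np.percentile(g[dormant], 100.0 * (1.0 - growth_ratio))
    return mask | (dormant & (g > threshold))

def prune(weights, mask, pruning_ratio):
    """Deactivate active connections whose weight magnitude is below the
    pruning_ratio percentile of active weight magnitudes."""
    if not mask.any():
        return mask.copy()
    w = np.abs(weights)
    threshold = np.percentile(w[mask], 100.0 * pruning_ratio)
    return mask & (w >= threshold)

def iterative_prune(try_prune_and_retrain, initial_ratio=0.2, min_ratio=0.01):
    """Prune repeatedly, halving the ratio whenever retraining fails to restore
    validation accuracy; stop once the ratio falls below min_ratio.

    try_prune_and_retrain(ratio) -> bool is a hypothetical callback that prunes at
    `ratio`, retrains, keeps the result if accuracy is restored (returning True),
    and otherwise reverts to the previous model (returning False).
    """
    ratio = initial_ratio
    history = []
    while ratio >= min_ratio:
        restored = try_prune_and_retrain(ratio)
        history.append((ratio, restored))
        if not restored:
            ratio /= 2.0  # halve the ratio after a failed pruning attempt
    return history
```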

The table in FIG. 9 presents the confusion matrix of DiabNN-server for the binary classification task. DiabNN-server achieves an overall accuracy of 96.6%. For the healthy instances, it achieves a very low false positive rate (FPR) of 3.5%, demonstrating its effectiveness in avoiding false alarms. Here, the FPR is the number of false predictions of disease onset (positive) divided by the total number of healthy (negative) instances. For the diabetic instances, it achieves a false negative rate (FNR) of 3.2%, indicating its effectiveness in raising alarms when diabetes does occur.
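The binary metrics reduce to simple ratios over the confusion matrix. The sketch below assumes rows are true labels, columns are predictions, and class 0 is healthy (negative). The function name is hypothetical, and the counts in the example call are toy values, not the values in FIG. 9.

```python
def binary_rates(cm):
    """cm[true][pred]; class 0 = healthy (negative), class 1 = diabetic (positive)."""
    tn, fp = cm[0]
    fn, tp = cm[1]
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "fpr": fp / (tn + fp),  # false alarms over all healthy instances
        "fnr": fn / (fn + tp),  # missed diabetic instances over all diabetic instances
    }

# Toy counts for illustration only.
print(binary_rates([[90, 10], [5, 95]]))
```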

The confusion matrix of DiabNN-server for the three-class classification task is presented in the table in FIG. 10. DiabNN-server achieves an overall accuracy of 96.2%. For the healthy instances, it achieves a low FPR of 4.1%, again demonstrating its ability to avoid false alarms. It also delivers low FNRs of 2.5% and 6.9% for type-1 and type-2 diabetic individuals, respectively. Each FNR is the number of false predictions for a target diabetes type divided by the total number of instances of that type.
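For the three-class case, the healthy-class FPR and the per-type FNRs follow the definitions above. This sketch assumes class index 0 is healthy, and indices 1 and 2 are type-1 and type-2 diabetes. The function name is hypothetical, and any matrix passed to it in testing is illustrative only.

```python
import numpy as np

def multiclass_rates(cm, healthy=0):
    """cm[i][j] counts instances of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1)            # instances per true class
    missed = totals - np.diag(cm)      # misclassified instances per true class
    accuracy = np.trace(cm) / cm.sum()
    fpr_healthy = missed[healthy] / totals[healthy]
    fnr = {k: missed[k] / totals[k] for k in range(len(cm)) if k != healthy}
    return accuracy, fpr_healthy, fnr
```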

Furthermore, the grow-and-prune training paradigm not only delivers high diagnostic accuracy but also yields model compactness as a side benefit. For binary classification, the final DiabNN-server model contains only 420.1K parameters, with a sparsity level of 90.7%. For the three-class classification task, the final DiabNN-server model contains only 441.3K parameters, with a sparsity level of 90.2%. The model compactness achieved in both cases can help reduce storage and energy consumption on the server.
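Sparsity here is the fraction of connections removed, so the nonzero parameter count equals the dense count times (1 − sparsity). The helpers below, with hypothetical names, compute sparsity from binary masks. They also invert that relation to estimate the dense size implied by a reported count.

```python
import numpy as np

def sparsity(masks):
    """Fraction of connections that are pruned (mask == False) across all layers."""
    total = sum(m.size for m in masks)
    active = sum(int(m.sum()) for m in masks)
    return 1.0 - active / total

def implied_dense_size(nonzero_params, sparsity_level):
    """Invert sparsity = 1 - nonzero / dense to recover the dense size."""
    return nonzero_params / (1.0 - sparsity_level)
```

For example, 420.1K nonzero parameters at 90.7% sparsity imply a dense model of roughly 420.1K / 0.093 ≈ 4.52 million parameters. For reference, a fully dense 3712-1024-512-256-128-64 stack with an assumed 2-way output layer has about 4.50 million weights and biases. The two figures therefore agree up to the rounding of the reported sparsity.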

DiabNN-edge performance evaluation:

The performance of DiabNN-edge is next analyzed.

Data input: Unlike the SC layer based DiabNN-server, the SR layer based DiabNN-edge acts on time series data step by step. Thus, at each time step, the temporal signal values from each data stream are concatenated with the demographic information to form an input vector of length 40. This length corresponds to seven smartwatch data streams, 26 smartphone data streams, and seven demographic features, as shown in FIG. 8. DiabNN-edge operates on four input vectors per second. When a signal reading is missing in a data stream (e.g., due to a lower sampling frequency), the closest previous reading in that data stream is used as the interpolated value.
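The step-wise input construction, including the closest-previous-reading rule, amounts to a zero-order hold onto a 4 Hz grid. The sketch below assumes each stream is given as sorted timestamps (in seconds) with matching values. It also assumes that steps before a stream's first reading reuse that first reading. These choices and the function names are assumptions.

```python
import numpy as np

STEP_HZ = 4  # DiabNN-edge consumes four input vectors per second

def sample_and_hold(timestamps, values, step_times):
    """For each step time, take the closest previous reading (zero-order hold).

    Steps before the first reading reuse the first reading (assumption).
    """
    idx = np.searchsorted(timestamps, step_times, side="right") - 1
    idx = np.clip(idx, 0, len(values) - 1)
    return np.asarray(values)[idx]

def edge_inputs(streams, demographics, duration_s, step_hz=STEP_HZ):
    """streams: list of (timestamps, values) pairs, one per data stream.

    Returns an array of shape (n_steps, n_streams + n_demographic).
    """
    step_times = np.arange(0, duration_s, 1.0 / step_hz)
    cols = [sample_and_hold(t, v, step_times) for t, v in streams]
    demo = np.tile(np.asarray(demographics, dtype=float), (len(step_times), 1))
    return np.column_stack(cols + [demo])
```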

Model architecture: DiabNN-edge contains one H-LSTM cell based SR layer with a hidden state width of 96. Each control gate within the H-LSTM cell contains one hidden layer. ReLU is used as the nonlinear activation function.
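A single H-LSTM step can be sketched as a standard LSTM update in which each control gate's affine map is replaced by a small network with one hidden layer. Only the hidden state width of 96 and the input length of 40 come from the text. The following are assumptions for illustration:

- the gate hidden-layer width (`gate_hidden`);
- the initialization;
- the use of ReLU inside the gates, followed by sigmoid or tanh at the gate outputs.

Dropout and sparsity masks are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_hlstm(n_in, n_hidden, gate_hidden, seed=0):
    """One two-layer network per gate: forget (f), input (i), output (o), cell candidate (g)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "o", "g"):
        params[gate] = {
            "w1": rng.normal(0.0, 0.1, (n_in + n_hidden, gate_hidden)),
            "b1": np.zeros(gate_hidden),
            "w2": rng.normal(0.0, 0.1, (gate_hidden, n_hidden)),
            "b2": np.zeros(n_hidden),
        }
    return params

def gate_net(p, z):
    hidden = np.maximum(z @ p["w1"] + p["b1"], 0.0)  # the single ReLU hidden layer
    return hidden @ p["w2"] + p["b2"]

def hlstm_step(params, x, h, c):
    """Advance the cell by one time step given input x and previous state (h, c)."""
    z = np.concatenate([x, h])
    f = sigmoid(gate_net(params["f"], z))
    i = sigmoid(gate_net(params["i"], z))
    o = sigmoid(gate_net(params["o"], z))
    g = np.tanh(gate_net(params["g"], z))
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```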

Training: An SGD optimizer with a momentum of 0.9 is used for this experiment. The learning rate is initialized to 0.001 and is divided by 10 when the validation accuracy does not increase for 30 consecutive epochs. A batch size of 64 and a dropout ratio of 0.2 are used for training. For grow-and-prune training, the same hyperparameter set as in the DiabNN-server experiment is used.
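Both experiments divide the learning rate by 10 after a fixed number of epochs without validation-accuracy improvement: 50 for DiabNN-server and 30 for DiabNN-edge. This policy can be sketched as a small plateau scheduler. The class name is hypothetical. PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` with `mode="max"` and `factor=0.1` implements a comparable policy, although it counts patience slightly differently.

```python
class PlateauDecay:
    """Divide the learning rate by `factor` when validation accuracy has not
    improved for `patience` consecutive epochs."""

    def __init__(self, lr, patience, factor=10.0):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("-inf")
        self.stale = 0  # consecutive epochs without improvement

    def step(self, val_acc):
        """Call once per epoch with the current validation accuracy."""
        if val_acc > self.best:
            self.best = val_acc
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr /= self.factor
                self.stale = 0
        return self.lr

# DiabNN-edge setting from the text: initial rate 0.001, patience of 30 epochs.
scheduler = PlateauDecay(lr=0.001, patience=30)
```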

The confusion matrix of DiabNN-edge for the binary classification task is presented in the table in FIG. 11. DiabNN-edge achieves an overall accuracy of 95.5%. For the healthy case, it also achieves a very low FPR of 3.1%. For diabetic instances, it achieves an FNR of 6.5%. This shows that DiabNN-edge can also effectively raise disease alarms on the edge.

DiabNN-edge is also evaluated on the three-class classification task, and its confusion matrix is presented in the table in FIG. 12. DiabNN-edge achieves an overall accuracy of 95.3%. For the healthy case, it achieves an FPR of 3.1%. It achieves FNRs of 7.8% and 3.4% for the type-1 and type-2 diabetic instances, respectively.

DiabNN-edge delivers extreme model compactness. For binary classification, the final DiabNN-edge model has a sparsity level of 96.5%, yielding a model with only 3.1K parameters. For the three-class classification task, the final DiabNN-edge model has a sparsity level of 96.3%, yielding a model with only 3.3K parameters. This greatly assists inference on the edge, which typically suffers from very limited resource budgets.

Results:

As mentioned earlier, DiabNN-edge and DiabNN-server offer several performance trade-offs among diagnostic accuracy, storage cost, and run-time efficiency. This provides flexible design choices that can accommodate varying design objectives related to model deployment. To illustrate their differences, the two models are compared for the binary classification task in the table in FIG. 13. DiabNN-server achieves a higher accuracy and a lower FNR. DiabNN-edge, on the other hand, caters to edge-side inference by enabling:

(1) A smaller model size: The edge model contains 135.5× fewer parameters, leading to a substantial memory reduction.

(2) Less computation: It requires 2.3× fewer FLOPs per inference, enabling a more efficient, and hence more frequent, monitoring capability on the edge.

(3) A lower FPR: It reduces the FPR by 0.3%. This results in fewer false alarms and hence improved usability of the system in daily use scenarios.
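The 135.5× figure in item (1) follows from the binary-task parameter counts reported earlier for the two models: 420.1K for DiabNN-server and 3.1K for DiabNN-edge. This short check reproduces it.

```python
# Binary-task parameter counts reported in the text.
SERVER_PARAMS = 420.1e3
EDGE_PARAMS = 3.1e3

size_reduction = SERVER_PARAMS / EDGE_PARAMS
print(f"DiabNN-edge has {size_reduction:.1f}x fewer parameters")
```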

Next, DiabNNs are compared with previously used learning methods, including SVMs with linear and RBF kernels, k-NN, random forest, and linear ridge classifiers. For all the methods, the same train/validation/test split and the same binary classification task are used for a fair comparison. In line with previous studies, the following features are extracted from each monitoring window: the signal mean, variance, Fourier transform coefficients, and the third-order Daubechies wavelet transform approximation and detail coefficients on Daubechies D2, D4, D8, and D24 filters. This results in a feature vector of length 304 per data instance. All the non-NN baselines are trained using the Python-based scikit-learn library. The performance of all the inference models is compared in the table in FIG. 14. In addition to classification accuracy, the FLOPs per inference required by both the feature extraction and classification stages are also computed. DiabNN-server achieves the highest accuracy among all the models. With a higher accuracy than all the non-NN baselines, DiabNN-edge achieves the smallest model size (up to 387.1× reduction) and the fewest FLOPs per inference (up to 8.1× reduction). Note that the feature extraction stage alone accounts for 491K FLOPs before the classification stage even starts executing. This is already 1.3× the total inference cost of DiabNN-edge.
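A reduced version of the per-window feature extraction used for the non-NN baselines is sketched below with NumPy only. It computes the mean, variance, leading FFT magnitudes, and a one-level Haar (Daubechies D2) decomposition. Two choices are hypothetical: summarizing the wavelet coefficients by their mean and standard deviation, and the `n_fft` cutoff. The exact recipe that produces 304 features is not reproduced. In practice, a library such as PyWavelets (`pywt.wavedec`) would typically supply the longer D4, D8, and D24 filters and multi-level decompositions.

```python
import numpy as np

def haar_level(x):
    """One level of the Haar (Daubechies D2) discrete wavelet transform."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])  # pad odd-length input by repeating the last sample
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def window_features(x, n_fft=4):
    """Hand-crafted features for one stream in one monitoring window (hypothetical subset)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x))[:n_fft]  # leading Fourier magnitudes
    approx, detail = haar_level(x)
    return np.concatenate([[x.mean(), x.var()], spectrum, [approx.mean(), detail.std()]])
```

Even this reduced pipeline requires an FFT and a wavelet pass per stream before any classifier runs. This illustrates why feature extraction alone can cost more FLOPs than the full inference of DiabNN-edge.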

DiabDeep is compared with previous work in the table in FIG. 15, using the same binary classification task on which these studies focus. DiabDeep achieves the highest accuracy relative to the baselines due to two major advantages. First, it relies on a more comprehensive set of WMSs, which captures a wider spectrum of user signals in the daily context for diagnostic decisions. Second, it utilizes a grow-and-prune training paradigm that learns both the connections and the weights of DiabNNs. This enables a more effective SGD-based search in both the model architecture space and the parameter space.

Conclusion:

As such, generally disclosed herein are embodiments of a framework called DiabDeep that combines WMSs with efficient DiabNNs for continuous and pervasive diabetes diagnosis on both the server and the edge. On the resource-rich server, stacked SC layers are deployed to focus on high accuracy. On the resource-scarce edge, an H-LSTM based SR layer is used to reduce computation and storage costs with only a minor accuracy loss. DiabNNs are trained by leveraging gradient-based growth and magnitude-based pruning methodologies. This enables DiabNNs to learn both weights and connections during training. DiabDeep is evaluated on data collected from 39 participants. Embodiments of the disclosed system achieve a 96.6% (95.5%) accuracy in classifying diabetics against healthy individuals on the server (edge), and a 96.2% (95.3%) accuracy in distinguishing among type-1 diabetic, type-2 diabetic, and healthy individuals. Against previous baselines, such as SVMs with linear and RBF kernels, k-NN, random forest, and linear ridge classifiers, DiabNN-edge reduces model size (FLOPs) by up to 387.1× (8.1×) while improving accuracy. Thus, DiabDeep can be employed in a pervasive fashion while offering high efficiency and accuracy.

It is understood that the above-described embodiments are only illustrative of the application of the principles of the present invention. The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope. Thus, while the present invention has been fully described above with particularity and detail in connection with what is presently deemed to be the most practical and preferred embodiment of the invention, it will be apparent to those of ordinary skill in the art that numerous modifications may be made without departing from the principles and concepts of the invention as set forth in the claims.

1. A machine-learning based system for diabetes analysis, comprising one or more processors configured to interact with a plurality of wearable medical sensors (WMSs), the processors configured to: receive physiological data from the WMSs and demographic data from a user interface; train at least one neural network based on a grow-and-prune paradigm to generate at least one diabetes inference model, the neural network to grow at least one of connections and neurons based on gradient information and to prune away at least one of connections and neurons based on magnitude information; and output a diabetes-based decision by inputting the received physiological data and demographic data into the generated diabetes inference model.
2. The system of claim 1, wherein the physiological data comprises at least one of heart rate, body temperature, galvanic skin response, and blood volume pulse.
3. The system of claim 1, wherein the physiological data comprises electromechanical data and ambient environmental data.
4. The system of claim 1, wherein the demographic data comprises at least one of age, weight, gender, and height.
5. The system of claim 1, wherein the growing at least one of connections and neurons based on gradient information comprises adding a connection or neuron when its gradient magnitude is greater than a predefined percentile of gradient magnitudes based on a growth ratio.
6. The system of claim 1, wherein the pruning away at least one of connections and neurons based on magnitude information comprises removing a connection or neuron when its magnitude is less than a predefined percentile of magnitudes based on a pruning ratio.
7. (canceled)
8. The system of claim 1, wherein the generated inference model comprises at least one of a server-based inference model and an edge-based inference model.
9. (canceled)
10. (canceled)
11. (canceled)
12. (canceled)
13. (canceled)
14. (canceled)
15. (canceled)
16. (canceled)
17. (canceled)
18. (canceled)
19. (canceled)
20. A machine-learning based method for diabetes analysis based on one or more processors configured to interact with a plurality of wearable medical sensors (WMSs), the method comprising: receiving physiological data from the WMSs and demographic data from a user interface; training at least one neural network based on a grow-and-prune paradigm to generate at least one diabetes inference model, the neural network to grow at least one of connections and neurons based on gradient information and to prune away at least one of connections and neurons based on magnitude information; and outputting a diabetes-based decision by inputting the received physiological data and demographic data into the generated diabetes inference model.
21. The method of claim 20, wherein the physiological data comprises at least one of heart rate, body temperature, galvanic skin response, and blood volume pulse.
22. The method of claim 20, wherein the physiological data comprises electromechanical data and ambient environmental data.
23. The method of claim 20, wherein the demographic data comprises at least one of age, weight, gender, and height.
24. The method of claim 20, wherein the growing at least one of connections and neurons based on gradient information comprises adding a connection or neuron when its gradient magnitude is greater than a predefined percentile of gradient magnitudes based on a growth ratio.
25. The method of claim 20, wherein the pruning away at least one of connections and neurons based on magnitude information comprises removing a connection or neuron when its magnitude is less than a predefined percentile of magnitudes based on a pruning ratio.
26. (canceled)
27. (canceled)
28. (canceled)
29. (canceled)
30. (canceled)
31. (canceled)
32. (canceled)
33. (canceled)
34. (canceled)
35. (canceled)
36. (canceled)
37. (canceled)
38. (canceled)
39. A non-transitory computer-readable medium having stored thereon a computer program for execution by a processor configured to perform a machine-learning based method for diabetes analysis, the method comprising: receiving physiological data from one or more wearable medical sensors (WMSs) and demographic data from a user interface; training at least one neural network based on a grow-and-prune paradigm to generate at least one diabetes inference model, the neural network to grow at least one of connections and neurons based on gradient information and to prune away at least one of connections and neurons based on magnitude information; and outputting a diabetes-based decision by inputting the received physiological data and demographic data into the generated diabetes inference model.
40. The non-transitory computer-readable medium of claim 39, wherein the physiological data comprises at least one of heart rate, body temperature, galvanic skin response, and blood volume pulse.
41. The non-transitory computer-readable medium of claim 39, wherein the physiological data comprises electromechanical data and ambient environmental data.
42. The non-transitory computer-readable medium of claim 39, wherein the demographic data comprises at least one of age, weight, gender, and height.
43. The non-transitory computer-readable medium of claim 39, wherein the growing at least one of connections and neurons based on gradient information comprises adding a connection or neuron when its gradient magnitude is greater than a predefined percentile of gradient magnitudes based on a growth ratio.
44. The non-transitory computer-readable medium of claim 39, wherein the pruning away at least one of connections and neurons based on magnitude information comprises removing a connection or neuron when its magnitude is less than a predefined percentile of magnitudes based on a pruning ratio.
45. (canceled)
46. The non-transitory computer-readable medium of claim 39, wherein the generated inference model comprises at least one of a server-based inference model and an edge-based inference model.
47. (canceled)
48. (canceled)
49. (canceled)
50. (canceled)
51. (canceled)
52. (canceled)
53. (canceled)
54. (canceled)
55. (canceled)
56. (canceled)
57. (canceled)